Mining Compressing Patterns in a Data Stream
Abstract
Mining patterns that compress the data well has been shown to be an effective approach for extracting meaningful patterns and for addressing the redundancy issue in frequent pattern mining. Most existing work considers mining compressing patterns from a static database of itemsets or sequences. These approaches require multiple passes through the data and do not scale to the size of data streams. In this paper, we study the problem of mining compressing sequential patterns from a data stream. We propose an approximate algorithm that needs only a single pass through the data and efficiently extracts a meaningful and non-redundant set of sequential patterns. Experiments on three synthetic and three real-world large-scale datasets show that our approach extracts compressing patterns as meaningful as those found by the state-of-the-art multi-pass algorithms proposed for static databases of sequences. Moreover, our approach scales linearly with the size of the data stream, whereas none of the state-of-the-art algorithms does.
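To make the notion of a "compressing pattern" concrete, here is a toy sketch. It is not the algorithm proposed in the paper. It assumes a deliberately simplified cost model: description length is counted in symbols rather than bits, patterns must occur contiguously (no gaps), and a used pattern costs its own length plus one codeword in the dictionary. The names `encode`, `description_length`, and `compression_gain` are hypothetical and chosen only for illustration.

```python
from typing import Hashable, List, Sequence

CODE = "<P>"  # hypothetical codeword that stands for the pattern


def encode(seq: Sequence[Hashable], pattern: Sequence[Hashable]) -> List[Hashable]:
    """Single left-to-right pass: replace each non-overlapping,
    contiguous occurrence of `pattern` with one codeword."""
    out: List[Hashable] = []
    pat = tuple(pattern)
    m = len(pat)
    i = 0
    while i < len(seq):
        if m > 0 and tuple(seq[i:i + m]) == pat:
            out.append(CODE)
            i += m
        else:
            out.append(seq[i])
            i += 1
    return out


def description_length(seq: Sequence[Hashable], pattern: Sequence[Hashable]) -> int:
    """Symbols in the encoded sequence plus the dictionary cost.
    Assumed cost model: a pattern that shortens the sequence costs
    len(pattern) + 1 symbols; otherwise it is not stored at all."""
    encoded = encode(seq, pattern)
    used = len(encoded) < len(seq)
    dict_cost = len(pattern) + 1 if used else 0
    return len(encoded) + dict_cost


def compression_gain(seq: Sequence[Hashable], pattern: Sequence[Hashable]) -> int:
    """Symbols saved by using `pattern`, relative to the raw sequence."""
    return len(seq) - description_length(seq, pattern)


if __name__ == "__main__":
    stream = list("abcabcabcx")
    for candidate in [("a", "b", "c"), ("a", "b"), ("x", "y")]:
        print(candidate, compression_gain(stream, candidate))
```

Under this toy model, a pattern is worth keeping only when the symbols it saves in the sequence outweigh what it costs to store in the dictionary, which is the intuition behind preferring compressing patterns over merely frequent ones.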
Similar resources
CASW: Context Aware Sliding window for Frequent Itemset Mining over Data Streams
In recent years, advances in both hardware and software technologies, coupled with high-speed data generation, have led to data streams and data stream mining. Data is generated much faster in data stream applications, and large volumes of data are produced in a short turnaround time. Hence it becomes natural to mine data on arrival, which is usually termed data stream mining. General fre...
Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows
Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that appear frequently in the data set and mark them as frequent patterns so that users can make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...
Mining Compressing Sequential Patterns
Compression-based pattern mining has been successfully applied to many data mining tasks. We propose an approach based on the minimum description length principle to extract sequential patterns that compress a database of sequences well. We show that mining compressing patterns is NP-hard and belongs to the class of inapproximable problems. We propose two heuristic algorithms for mining compress...
Mining Compressed Repetitive Gapped Sequential Patterns Efficiently
Mining frequent sequential patterns from sequence databases has been a central research topic in data mining, and various efficient sequential pattern mining algorithms have been proposed and studied. Recently, in many problem domains (e.g., program execution traces), a novel sequential pattern mining task, called mining repetitive gapped sequential patterns, has attracted the attention of m...
Incrementally Mining Recently Repeating Patterns over Data Streams
Repeating patterns represent temporal relations among data items, which can be used for data summarization and data prediction. More and more data from various applications is generated as a data stream. When time sensitivity is a concern, mining repeating patterns from the entire history of a data stream does not capture the current trend of patterns in the stream. Therefore, the tra...